
Conversation


@rabintiwari45 rabintiwari45 commented Jun 8, 2025

This PR introduces support for patching AutoModelForSequenceClassification within the FastModel.from_pretrained() interface. It enables the following usage pattern:

from unsloth import FastModel
from transformers import AutoModelForSequenceClassification

model, tokenizer = FastModel.from_pretrained(
    model_name = "your-model-name",  # placeholder: any supported base checkpoint
    auto_model = AutoModelForSequenceClassification,
)

Changes Included
Added patching logic for AutoModelForSequenceClassification to enable compatibility with FastModel.

Updated the finetuner to allow training with sequence classification models.

Modified unsloth_zoo to gracefully handle weights that do not have a quant_state attribute:

# Check if quant_state exists
if not hasattr(weight, 'quant_state'):
    print(f"Skipping {name}: no quant_state found")
    continue
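To illustrate the intent of this guard, here is a minimal, self-contained sketch of walking a model's parameters and skipping weights that lack a `quant_state` (the helper name and loop are illustrative, not the actual unsloth_zoo code):

```python
# Illustrative helper (not the actual unsloth_zoo code): split a model's
# parameters into those carrying a bitsandbytes-style `quant_state` and
# those that do not, e.g. a freshly added classification head.
def split_by_quant_state(model):
    quantized, skipped = [], []
    for name, weight in model.named_parameters():
        if not hasattr(weight, "quant_state"):
            # Plain parameter with no quant_state: nothing to dequantize.
            skipped.append(name)
            continue
        quantized.append(name)
    return quantized, skipped
```

The new classification head added by AutoModelForSequenceClassification is presumably exactly the kind of plain, unquantized weight this guard is meant to skip.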

Notes
While the patch works as intended in current testing, there may be edge cases or integration concerns that require further review.

Please verify whether any additional logic or edge-case handling is needed in related modules.

This PR is linked to #165 in unsloth_zoo.

@pluesclues
Collaborator

Hey @rabintiwari45, I also happened to write a PR for sequence classification: #2710. I noticed that your PR is mainly for VLMs; correct me if I am wrong. My patch does look a bit shorter, so please let me know if I am missing anything. We can probably incorporate both patches somehow.

@rabintiwari45
Author

Hi @pluesclues
Yes, I’ve mostly implemented it for VLMs. I structured it in a way that lets us use FastModel and AutoModelForSequenceClassification together, as I wasn't fully aware of the broader requirements for supporting sequence classification. We can definitely merge the patches.

@pluesclues
Collaborator

I believe @Etherll has a way of loading any type of model into FastModel as well. There may be an easier way to load the models you want, since this PR is quite extensive, but I was having a bit of trouble finding the code for it.

Collaborator

@Datta0 Datta0 left a comment


Hey @rabintiwari45
Thanks a lot, and kudos for these changes. I've added a few minor comments for now.

return_dict = return_dict if return_dict is not None else self.config.use_return_dict

# Get outputs from the language model part only (ignore vision for sequence classification)
language_model_outputs = self.llava_next.language_model(
Collaborator


@danielhanchen do you think we should have a forward-function creator that takes language_model as input and performs all of this? The code is similar between both models.

That way we could do LlavaNextForSequenceClassification.forward = create_forward(self.llava_next.language_model) and MllamaForSequenceClassification.forward = create_forward(self.mllama.language_model).
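For what it's worth, a minimal sketch of what such a create_forward factory could look like; the name, signature, and the score classification head are assumptions for illustration, not code from either PR:

```python
# Illustrative factory: builds one shared `forward` for the
# sequence-classification wrappers, parameterized by the language model.
def create_forward(language_model):
    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        # Use only the language-model part; vision inputs are ignored
        # for sequence classification, as in the snippet above.
        outputs = language_model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            **kwargs,
        )
        hidden_states = outputs[0]
        # Apply the classification head on top of the hidden states.
        return self.score(hidden_states)
    return forward
```

Each wrapper would then bind it once, e.g. LlavaNextForSequenceClassification.forward = create_forward(...), keeping the shared logic in a single place.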

Author


Hello @danielhanchen, when you have a moment, could you please take a look at this PR? I'd really appreciate your feedback.

Collaborator

@Datta0 Datta0 left a comment


LGTM.
